Abstract

In compressed sensing, sparse solutions are usually obtained by solving an $\ell^1$-minimization
problem. Furthermore, the sparsity of the signal need not be directly given. In fact,
it is sufficient to have a signal that is sparse after an application of a suitable transform. In
this paper we consider the stability of solutions obtained from $\ell^p$-minimization for arbitrary
$0 < p \le 1$. Further, we suppose that the signals are sparse with respect to general redundant
transforms associated to not necessarily tight frames. Since we are considering general
frames, the role of the dual frame has to be additionally discussed. For our stability analysis
we introduce a new concept of so-called frames with identifiable duals. Further, we
numerically highlight a gap between the theory and the applications of compressed sensing
for some specific redundant transforms.
1 Introduction


Suppose we are interested in the reconstruction of a certain object of interest but we are not
able to observe the signal directly. We are instead allowed to sense it. Mathematically
speaking, we understand this problem as solving a system of linear equations

$$Ax = y, \tag{1.1}$$

where $x \in \mathbb{R}^n$ represents the object of interest, $A \in \mathbb{R}^{m \times n}$ the sensing matrix and $y \in \mathbb{R}^m$
represents the resulting observations. It is of course preferred to keep the amount of
acquired data small, i.e. $m$ should be small. However, if $m$ is (possibly) much smaller than $n$,
then (1.1) does not have a unique solution in general. This issue can be resolved by


imposing additional assumptions. One of such possible assumptions is the concept of sparsity. 
We say a signal x is sparse, if the vector x contains only very few non-zero entries. This 
concept of sparsity is fundamental in the field of compressed sensing ig m EZ] where one 


aims to recover sparse signals using only very few measurements. With a view to solve (1.1) a 


compressed sensing reconstruction is then typically computed by solving an equality constrained 
minimization problem of the form 


$$\min_x f(x) \quad \text{subject to } y = Ax, \tag{P}$$

where $f : \mathbb{R}^n \to \mathbb{R}_+$ is some non-negative function, the so-called prior function or regularizer.

However, it is often not feasible in practice to acquire the data $y$ without any errors or noise.
This means that instead of having (1.1) we rather have

$$Ax \approx y,$$

where we use $a \approx b$ to indicate that $a$ is close to $b$ but not necessarily equal. Therefore, in order to
allow perturbations during the measurement process, the model (P) is extended to

$$\min_x f(x) \quad \text{subject to } \|y - Ax\|_2 \le \varepsilon, \tag{P$_\varepsilon$}$$

where $\varepsilon > 0$ is a control parameter for the error that may arise during the acquisition. Note
that any element satisfying the constraint in (P) also satisfies the constraint in (P$_\varepsilon$).

A major question is of course how to choose the prior $f : \mathbb{R}^n \to \mathbb{R}_+$. In the spirit of
compressed sensing the ideal choice would be

$$f(x) = \|x\|_0 := \#\{k : x_k \neq 0\},$$

since for this particular choice the problem (P) returns the sparsest solution matching the
constraint. Unfortunately, for $f(x) = \|x\|_0$ the $\ell^0$-problem (P) is NP-hard [22]. A common
escape is a convex relaxation of the $\ell^0$-problem by using, for example,

$$f(x) = \|x\|_1 = \sum_{k \le n} |x_k|$$

in (P), which is also known as basis pursuit and has been investigated widely in [luiidniEiii.
However, for $0 < p < 1$ the $\ell^p$-quasi norm

$$\|x\|_p := \Big( \sum_{k \le n} |x_k|^p \Big)^{1/p}, \qquad 0 < p < 1,$$

is closer to the $\ell^0$-function $\|\cdot\|_0$ than $\|\cdot\|_1$ is, see Figure 1. Therefore it is of natural interest
to study problem (P$_\varepsilon$) for $f(x) = \|x\|_p$, which has also been done, for instance, in |13l H5l HO] .
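
As a small numerical illustration of why $\|\cdot\|_p$ with small $p$ sits closer to $\|\cdot\|_0$ than $\|\cdot\|_1$ does, the following Python sketch evaluates $\|x\|_p^p$ for decreasing $p$ and checks that the triangle inequality fails for $p < 1$. The helper names (`l0_count`, `lp_power`, `lp_quasi_norm`) and the test vector are illustrative assumptions and do not come from the paper.

```python
def l0_count(x, tol=0.0):
    """Number of entries whose modulus exceeds tol (the l^0 'norm')."""
    return sum(1 for v in x if abs(v) > tol)


def lp_power(x, p):
    """The quantity ||x||_p^p = sum_k |x_k|^p for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return sum(abs(v) ** p for v in x)


def lp_quasi_norm(x, p):
    """The l^p quasi norm ||x||_p = (sum_k |x_k|^p)^(1/p)."""
    return lp_power(x, p) ** (1.0 / p)


# Hypothetical 2-sparse test vector.
x = [3.0, 0.0, -0.5, 0.0, 0.0]

# ||x||_p^p approaches the number of non-zero entries as p decreases to 0.
powers = {p: lp_power(x, p) for p in (1.0, 0.5, 0.1, 0.01)}

# For p < 1 the triangle inequality fails, hence only a quasi norm:
# ||e1 + e2||_p exceeds ||e1||_p + ||e2||_p.
lhs = lp_quasi_norm([1.0, 1.0], 0.5)
rhs = lp_quasi_norm([1.0, 0.0], 0.5) + lp_quasi_norm([0.0, 1.0], 0.5)
```

The dictionary `powers` makes the limiting behaviour visible: each non-zero entry contributes $|x_k|^p \to 1$ while zero entries contribute nothing.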



Figure 1: From $\ell^1$ to $\ell^0$: unit balls with respect to the $\ell^p$-quasi norm for decreasing $p$.


The cases (P) and (P$_\varepsilon$) with $f(x) = \|x\|_p$ and $0 < p \le 1$ are usually called the synthesis
formulation and expect the vector $x$ to be sparse. However, it is observed in many applications
that the vector $x$ is only transform sparse, i.e. sparse after an application of a suitable transform,
or often even only compressible, i.e. very few entries are large and the rest is small in modulus
but not necessarily zero. For example, most natural images are compressible with respect to
multiscale systems such as wavelets |T^, curvelets |3 or shear lets |3T]. More precisely, the 
transform coefficients decay to zero with a certain rate as the scales increase, see Figure 2. The
transform sparse model is for instance used in magnetic resonance imaging (MRI), where the
sparsity of medical images with respect to a wavelet transform is utilized, cf. [39].

Figure 2: Left: A natural image of size 2048 x 2048. Right: Distribution of the shearlet
coefficients of the best $N$-term approximation using 5% of the largest coefficients in modulus.


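The transform sparse model leads to the so-called analysis formulation, that is, problems (P)
and (P$_\varepsilon$) with

$$f(x) = \|\Psi x\|_p = \Big( \sum_{\lambda \le N} |\langle x, \psi_\lambda \rangle|^p \Big)^{1/p}, \qquad 0 < p \le 1,$$

where $\Psi : \mathbb{R}^n \to \mathbb{R}^N$, $x \mapsto (\langle x, \psi_\lambda \rangle)_{\lambda \le N}$ is the analysis operator of a spanning system
$(\psi_\lambda)_{\lambda \le N}$ for $\mathbb{R}^n$.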


1.1 Overview of related work 

Problems of the form (P$_\varepsilon$) have been studied a lot in the literature for many different types of
functions $f$. We will now give an overview of those works that are most relevant and related to
our work. We start with one of the earliest models in compressed sensing, that is the synthesis
approach.


Synthesis approach for $\ell^1$

The synthesis formulation for the $\ell^1$-minimization problem is (P$_\varepsilon$) with $f(\cdot) = \|\cdot\|_1$, i.e.

$$\min_x \|x\|_1 \quad \text{subject to } \|y - Ax\|_2 \le \varepsilon. \tag{$\ell^1$-P$_\varepsilon^S$}$$


The minimization problem ($\ell^1$-P$_\varepsilon^S$) is also known as (inequality constrained) basis pursuit and
is very well studied in compressed sensing. Stability results of the form

$$\|x^* - x\|_2 \le C_1 \varepsilon + C_2 \frac{\|x - x_s\|_1}{\sqrt{s}}, \tag{1.2}$$

where $x^*$ denotes the solution to ($\ell^1$-P$_\varepsilon^S$) and $x_s$ is the best $s$-term approximation, i.e. the vector
consisting of the $s$ largest entries of $x$, are known. This is for example the case if $A$ satisfies
the so-called restricted isometry property (RIP) [20, 22]. We recall its definition here for
convenience.


Definition 1.1. Let $A \in \mathbb{R}^{m \times n}$ be a measurement matrix. If there exists $\delta_s$ such that

$$(1 - \delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s)\|x\|_2^2, \tag{1.3}$$

for all $s$-sparse vectors $x$, then we say $A$ satisfies the RIP of order $s$ with RIP constant $\delta_s$.

Note that there are other properties used in the literature, such as the null space property
[15] (and altered variants), to prove different stability estimates, see, for instance, [22].
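
The error term $\|x - x_s\|_1 / \sqrt{s}$ in (1.2) can be made concrete with a short Python sketch. It is only an illustration: the function names (`best_s_term`, `compressibility_term`) and the polynomially decaying test vector are assumptions chosen here, and "largest" is read as largest in modulus.

```python
import math


def best_s_term(x, s):
    """Keep the s entries of x that are largest in modulus, zero the rest."""
    order = sorted(range(len(x)), key=lambda k: abs(x[k]), reverse=True)
    keep = set(order[:s])
    return [x[k] if k in keep else 0.0 for k in range(len(x))]


def compressibility_term(x, s):
    """The quantity ||x - x_s||_1 / sqrt(s) appearing in (1.2)."""
    x_s = best_s_term(x, s)
    return sum(abs(a - b) for a, b in zip(x, x_s)) / math.sqrt(s)


# Hypothetical compressible vector: entries decay like 1 / k^2.
decaying = [1.0 / (k + 1) ** 2 for k in range(50)]
term_5 = compressibility_term(decaying, 5)
term_10 = compressibility_term(decaying, 10)

# For an exactly s-sparse vector the term vanishes.
sparse = [0.0, 4.0, 0.0, -1.0, 0.0]
term_sparse = compressibility_term(sparse, 2)
```

For an exactly $s$-sparse $x$ the second summand in (1.2) disappears, while for a compressible $x$ it shrinks as $s$ grows.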
Synthesis approach for $\ell^p$

It has been noticed that the $\ell^p$-quasi norms yield stronger emphasis on the sparsity and thus
in [laiisiEi] the authors have studied the $\ell^p$-minimization problem

$$\min_x \|x\|_p^p \quad \text{subject to } \|y - Ax\|_2 \le \varepsilon. \tag{$\ell^p$-P$_\varepsilon^S$}$$

Again, the results that are fundamental for the user of this minimization problem are stability
results of the form

$$\|x - x^*\|_2^p \le C_1(p)\,\varepsilon^p + C_2(p) \frac{\|x - x_s\|_p^p}{s^{1-p/2}}, \tag{1.4}$$

where we have used the notations from above. A significant difference to the bound obtained
in (1.2) is that the constants $C_1(p)$ and $C_2(p)$ in (1.4) now depend on $p \in (0,1]$. However, for
$p = 1$ the constants in both cases agree [la Sa El]. The assumption on the measurement
matrix $A$ so that (1.4) holds is again based on the restricted isometry property.

The reader might wonder if such results are of any use in practice, as the $\ell^p$-quasi norms
turn the problem into a non-convex minimization problem, which is again NP-hard in general [23].
However, in the same work [23] the authors showed that local minima can be computed
in polynomial time by using an interior point method. Further, its improvement over $\ell^1$ has
also been numerically demonstrated in that paper.


Analysis approach for $\ell^1$


Both minimization problems ($\ell^1$-P$_\varepsilon^S$) and ($\ell^p$-P$_\varepsilon^S$) assume the signal $x$ to be directly sparse;
however, as we have already mentioned in the introduction, this is often not the case in many
practical problems. One possible adaptation of the synthesis problem is the analysis formulation
of the minimization problem ($\ell^1$-P$_\varepsilon^S$), that is

$$\min_x \|\Psi x\|_1 \quad \text{subject to } \|y - Ax\|_2 \le \varepsilon, \tag{$\ell^1$-P$_\varepsilon^A$}$$

where $\Psi$ is some sparsifying transform associated to a spanning system $(\psi_\lambda)_{\lambda \le N}$ of $\mathbb{R}^n$. This
problem has initially been studied in [6] and the authors of that work proved stability results
of the form

$$\|x^* - x\|_2 \le C_1 \varepsilon + C_2 \frac{\|\Psi x - (\Psi x)_s\|_1}{\sqrt{s}} \tag{1.5}$$

for the case that $\Psi$ is the analysis operator associated to a Parseval frame and the measurement
matrix $A$ satisfies the so-called $\Psi$-RIP.

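Definition 1.2. Let $\Psi \in \mathbb{R}^{N \times n}$ be a dictionary and $A \in \mathbb{R}^{m \times n}$ the measurement matrix. If
there exists $\delta_s$ such that

$$(1 - \delta_s)\|\Psi^* x\|_2^2 \le \|A \Psi^* x\|_2^2 \le (1 + \delta_s)\|\Psi^* x\|_2^2, \tag{1.6}$$

for all $s$-sparse vectors $x$, then we say $A$ satisfies the $\Psi$-RIP of order $s$ with $\Psi$-RIP constant
$\delta_s$.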

This definition of the $\Psi$-RIP is the canonical analog to the restricted isometry property
for directly sparse signals. Since its invention an avalanche of research has taken place, producing a
wide and fruitful literature on this topic. For instance, the relationship between the analysis
formulation and the synthesis formulation has been studied in |18[ 02], as well as possible
generalizations of the results of [6] to general frames [28l EOl EZl 111 HZ] and not only the case
where $\Psi$ corresponds to a Parseval frame. Such generalizations are of great importance as
they show that not all frames are equally well suited for compressed sensing. By that we
mean that the number of measurements might vary significantly depending on the frame bounds,
which is clearly not an effect that enters the argument if one is dealing with Parseval frames.
This observation already hints that the choice of the sparsifying transform is a very delicate
problem. We will next discuss what we are interested in in this work, as well as what has to be
carefully considered in some practical problems in that relation.


Sparsifying transforms associated to arbitrary redundant systems 


In the above stated stability estimates (1.2), (1.4) and (1.5) it is desirable to have a fast decay of
$\|x - x_s\|_1$, $\|x - x_s\|_p^p$ or $\|\Psi x - (\Psi x)_s\|_1$, respectively. When focusing on the latter expression, it is
implied that the transform coefficients $(\langle x, \psi_\lambda \rangle)_\lambda$ should decrease fast in magnitude in order to
have a meaningful use of such a stability estimate. Recall that such an estimate was obtained
using the $\Psi$-RIP. Therefore, on the one hand we need a fast decay of the transform coefficients,
while on the other hand we also want the $\Psi$-RIP to be
fulfilled. The attentive reader will have noticed that there is a mismatch in these assumptions. In
fact, any signal $x$ of a given Hilbert space $\mathcal{H}$ can be written as

$$x = \sum_\lambda \langle x, \psi_\lambda \rangle \tilde\psi_\lambda = \sum_\lambda \langle x, \tilde\psi_\lambda \rangle \psi_\lambda, \tag{1.7}$$

where $(\psi_\lambda)_\lambda$ is a frame for the Hilbert space $\mathcal{H}$ and $(\tilde\psi_\lambda)_\lambda$ is a dual frame, see [14]. In particular,
we have

$$x \neq \sum_\lambda \langle x, \psi_\lambda \rangle \psi_\lambda \tag{1.8}$$


in general for non-tight frames. Equations (1.7) and (1.8) show that if we require a fast decay
of the coefficients $(\Psi x)_\lambda = (\langle x, \psi_\lambda \rangle)_\lambda$, then we must not assume the $\Psi$-RIP but rather the
$\tilde\Psi$-RIP, which is the $\Psi$-RIP with respect to the dual frame $(\tilde\psi_\lambda)_\lambda$ and not the primal frame
$(\psi_\lambda)_\lambda$. This rather simple observation can cause a great problem in some cases. For example, it
is not unusual that a primal frame is known explicitly, as well as a certain behaviour
of the transform coefficients $(\langle x, \psi_\lambda \rangle)_\lambda$, but a dual $(\tilde\psi_\lambda)_\lambda$ is not explicitly given. Moreover, the
theoretical behaviour of the dual coefficients can be completely different from that of the primal frame
coefficients. The shearlet transform |26l [351 ESI ET] is for instance a sparsifying transform
that is often used in certain imaging applications im an, but an
explicit formula for a dual does not exist for the case of compactly supported shearlets, whereas
the band-limited ones form a Parseval frame.
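
The mismatch between primal and dual coefficients can be checked by hand on a tiny example. The following Python sketch is an illustration under assumptions made here: a real, non-tight frame of three vectors in $\mathbb{R}^2$ and its canonical dual $\tilde\psi_\lambda = S^{-1}\psi_\lambda$, where $S$ is the frame operator. None of the names or the example frame are taken from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def frame_operator_2d(frame):
    """S = sum_k psi_k psi_k^T as a 2x2 matrix (real frame in R^2)."""
    return [[sum(psi[i] * psi[j] for psi in frame) for j in range(2)]
            for i in range(2)]


def inverse_2x2(S):
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def synthesize(coeffs, vectors):
    """sum_k coeffs[k] * vectors[k]."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(2)]


# Hypothetical non-tight frame for R^2.
frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S_inv = inverse_2x2(frame_operator_2d(frame))
dual = [[dot(row, psi) for row in S_inv] for psi in frame]  # canonical dual

x = [1.0, 0.0]
primal_coeffs = [dot(x, psi) for psi in frame]  # one of these is zero
dual_coeffs = [dot(x, psi) for psi in dual]     # none of these is zero

with_dual = synthesize(primal_coeffs, dual)     # recovers x, cf. (1.7)
with_primal = synthesize(primal_coeffs, frame)  # does not recover x
```

In this example the primal coefficient vector has one zero entry while the dual coefficient vector has none, so the two sparsity patterns differ even in two dimensions.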

This issue makes an assumption such as the restricted isometry property of the dual frame
rather artificial. Switching the roles, that is, assuming the $\Psi$-RIP for the primal frame to hold,
would on the other hand necessarily lead to the minimization over dual frame coefficients. This
again raises the question of how these coefficients behave in general, and if a dual is not known,
then there is no point in minimizing over dual frame coefficients.

Finally, we want to remark that if the computation of a dual is feasible, one could also
minimize over all duals in order to obtain the sparsest one. That again is the same as doing the
synthesis formulation [38]. However, we are interested in the analysis formulation in this paper
and a discussion of the analysis versus synthesis formulation is not in the focus of this paper,
for further interested on this matter we refer the reader to m- 


1.2 Contribution 


In this paper we combine ($\ell^1$-P$_\varepsilon^A$) and ($\ell^p$-P$_\varepsilon^S$) in order to obtain the best of both methods.
While aiming for stability results we will also resolve the sparsity mismatch explained in the
previous section that arises when arbitrary frames are considered.

Note that there are two fundamentally different possibilities to combine the two problems
($\ell^1$-P$_\varepsilon^A$) and ($\ell^p$-P$_\varepsilon^S$). Either one considers

$$\min_x \|\Psi x\|_p^p \quad \text{subject to } \|y - Ax\|_2 \le \varepsilon \tag{$\ell^p$-P$_\varepsilon^A$}$$

with a transform $\Psi$ that is associated to a fixed redundant system $(\psi_\lambda)_\lambda$, or

$$\min_x \|\tilde\Psi x\|_p^p \quad \text{subject to } \|y - Ax\|_2 \le \varepsilon, \tag{$\ell^p$-P$_\varepsilon^D$}$$

which corresponds to the minimization over dual frame coefficients. The latter problem suggests
assuming the $\Psi$-RIP with respect to the primal frame, as then the sparsity pattern matches, cf.
(1.7). Although stability is then very much expected, we shall show that this is indeed the case
for the sake of completeness. For ($\ell^p$-P$_\varepsilon^A$) the situation is different. It comes without
any surprise that stability can again be obtained if one assumes the $\Psi$-RIP with respect to
the dual frame, but we shall not do this. More importantly, we show that for certain types
of frames the assumption of the $\Psi$-RIP with respect to the primal frame is enough even if
one minimizes over the primal frame coefficients. The property that enables this approach is
that the frame has an identifiable dual, see Section 2.1.


Our results can be connected to the literature as follows:


Our results can be connected to the literature as follows: 


- For $p = 1$ our result generalizes the findings of [6] to the case of arbitrary redundant
systems that are not necessarily a Parseval frame. However, if the system is a Parseval
frame both results (and constants) agree.

- If $\Psi$ comes from a Parseval frame, then, due to the fact that all constants are given
explicitly, we can compare $\ell^p$-minimization with $\ell^1$-minimization. By carefully
handling the parameters we can obtain smaller constants than for the $\ell^1$-minimization studied
in [6].

- Frames that have an identifiable dual are, coincidentally, a generalization of scalable frames
[34], which have arisen in the literature from a completely different perspective.

Further, in practice it has been observed that the sparsity assumption might be impracticable,
meaning that the sparsity $s$ of the transform coefficients is larger than the dimension
of the object $x$ that is to be reconstructed, if the transform is truly redundant. We present a
numerical experiment that shows that this is an issue coming from the implementation
of such transforms rather than from the theory. This experiment in turn implies that there might be a large
gap between the theory and the applications of compressed sensing for redundant dictionaries,
cf. Section 5.


1.3 Outline 


In Section 2 we present the new principles that we developed in order to obtain stability results
for the analysis formulation of $\ell^p$-minimization ($\ell^p$-P$_\varepsilon^A$) and the minimization over dual coefficients
($\ell^p$-P$_\varepsilon^D$). The proofs of our main results are presented in Section 3, and Section 4 consists
of a numerical experiment simulating the applicability and possible benefits of considering
$\ell^p$-minimization in practice. In Section 5 we discuss the role of redundancy and how it fits to the
model of sparsity in compressed sensing for the case of the shearlet transform.
Notation

Throughout this paper $\Psi := (\psi_\lambda)_{\lambda \le N} \subset \mathcal{H}$ denotes a frame for some Hilbert space $\mathcal{H}$, i.e. there
exist (positive and finite) constants $c_1$ and $c_2$ such that

$$c_1 \|x\|_2^2 \le \sum_{\lambda=1}^{N} |\langle x, \psi_\lambda \rangle|^2 \le c_2 \|x\|_2^2 \qquad \forall x \in \mathcal{H}.$$

The constants $c_1$ and $c_2$ are called lower frame bound and upper frame bound, respectively. If
$c_1 = c_2 = 1$, then $\Psi$ is called a Parseval frame. With a slight abuse of notation we also denote
the analysis operator by

$$\Psi : \mathcal{H} \to \mathbb{C}^N, \quad x \mapsto (\langle x, \psi_\lambda \rangle)_{\lambda \le N},$$

and the synthesis operator $\Psi^*$ by

$$\Psi^* : \mathbb{C}^N \to \mathcal{H}, \quad (c_\lambda)_{\lambda \le N} \mapsto \sum_{\lambda \le N} c_\lambda \psi_\lambda.$$

In most cases we will have $\mathcal{H} = \mathbb{C}^n$. If $x$ is supposed to be an image in $\mathbb{C}^{n \times n}$ we will consider
$\mathcal{H} = \mathbb{C}^{n^2}$, the vectorization of the $n$ by $n$ images.

Note that if the elements $\psi_1, \dots, \psi_N$ are given as vectors in $\mathbb{C}^{n^2}$, then the synthesis operator
is simply the matrix $\Psi^* \in \mathbb{C}^{n^2 \times N}$ with columns $\psi_\lambda$, $\lambda \le N$.
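
To make the notation tangible, here is a minimal Python sketch of the analysis operator, the synthesis operator and the frame bounds for real frames in $\mathbb{R}^2$. The restriction to real vectors in two dimensions, the function names and the two example frames are assumptions for illustration. The frame bounds are obtained as the extreme eigenvalues of the $2 \times 2$ frame operator $\Psi^*\Psi$.

```python
import math


def analysis(frame, x):
    """Psi x = (<x, psi_k>)_k."""
    return [sum(a * b for a, b in zip(psi, x)) for psi in frame]


def synthesis(frame, coeffs):
    """Psi^* c = sum_k c_k psi_k."""
    n = len(frame[0])
    return [sum(c * psi[i] for c, psi in zip(coeffs, frame)) for i in range(n)]


def frame_bounds_2d(frame):
    """Smallest and largest eigenvalue of the symmetric 2x2 frame operator."""
    a = sum(psi[0] ** 2 for psi in frame)
    b = sum(psi[0] * psi[1] for psi in frame)
    d = sum(psi[1] ** 2 for psi in frame)
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius


# Three unit vectors at 120 degrees: a tight frame with c1 = c2 = 3/2.
mercedes = [[math.cos(math.pi / 2 + 2 * math.pi * k / 3),
             math.sin(math.pi / 2 + 2 * math.pi * k / 3)] for k in range(3)]
tight_bounds = frame_bounds_2d(mercedes)

# A non-tight frame: c1 = 1 and c2 = 3.
non_tight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c1, c2 = frame_bounds_2d(non_tight)

x = [0.3, -1.2]
energy = sum(c ** 2 for c in analysis(non_tight, x))
norm_sq = sum(v ** 2 for v in x)
```

The value `energy` lies between `c1 * norm_sq` and `c2 * norm_sq`, which is exactly the frame inequality above.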


2 $\ell^p$-minimization: the analysis formulation

This section contains the main concepts and results of this paper. We proceed by first proposing 
a new concept that is used later on to resolve the sparsity mismatch. 


2.1 Identifiable duals 


Let us recall the issue mentioned in the introduction. In the minimization problem ($\ell^p$-P$_\varepsilon^A$)
we are minimizing over primal frame coefficients $(c_\lambda)_{\lambda \le N} = (\langle x, \psi_\lambda \rangle)_{\lambda \le N}$, and suppose we wish
to prove stability estimates based on the $\Psi$-RIP. Then we notice that the $\Psi$-RIP is based
on the sparsity of signals in the reconstruction system $\Psi = (\psi_\lambda)_{\lambda \le N}$ due to the synthesis
operation, which is the adjoint but not an inverse in the general frame case. More precisely,
the following problem arises: The (assumed) sparse coefficient vector $(c_\lambda)_\lambda$ that we obtain from
the minimization problem does not give rise to a sparse representation with respect to the
reconstruction system $\Psi$ but rather to a dual $\tilde\Psi$, cf. (1.7) and (1.8). In fact, the dual frame
coefficients might have a completely different sparsity pattern. In order to circumvent this
problem we make the following definition.


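Definition 2.1. Let $I$ be some index set. We say a frame $\Psi = (\psi_\lambda)_{\lambda \in I}$ for $\mathcal{H}$ has an identifiable
dual if there exists a dual frame $\tilde\Psi = (\tilde\psi_\lambda)_{\lambda \in I}$ such that for all $x \in \mathcal{H}$ and any $\lambda \in I$ the coefficient in
modulus $|\langle x, \tilde\psi_\lambda \rangle|$ can be bounded from above and below by $|\langle x, \psi_\lambda \rangle|$, i.e. there exist constants
$d_1, d_2 > 0$ such that

$$d_1 |\langle x, \psi_\lambda \rangle| \le |\langle x, \tilde\psi_\lambda \rangle| \le d_2 |\langle x, \psi_\lambda \rangle| \tag{2.1}$$

for all $x \in \mathcal{H}$ and all $\lambda \in I$.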


If $\Psi$ is a frame that has an identifiable dual, then this property ensures that there exists a
dual such that the sparsity of the primal frame coefficients leads to a sparse representation in
a dual system. We proceed with discussing its relation to other special frames.

It is obvious that every tight frame has an identifiable dual. More generally, every scalable
frame (cf. Definition 2.2) has an identifiable dual.

Definition 2.2 ([34]). A frame $\Psi = \{\psi_\lambda : \lambda \in \mathbb{N}\}$ for $\mathcal{H}$ is called scalable if there exist scalars
$c_\lambda \ge 0$, $\lambda \in \mathbb{N}$, such that

$$\{c_\lambda \psi_\lambda : \lambda \in \mathbb{N}\}$$

forms a Parseval frame for $\mathcal{H}$. If there exists $\delta > 0$ such that $c_\lambda \ge \delta$ for all $\lambda$, then we call $\Psi$ positively
scalable.


The following result is trivial to prove and we therefore omit its proof. 

Proposition 2.3. Every scalable frame has an identifiable dual. 
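
The mechanism behind Proposition 2.3 can be tried out numerically. The Python sketch below rests on assumptions made here for illustration: a real frame in $\mathbb{R}^2$ obtained by rescaling three unit vectors at 120 degrees, strictly positive scalings $c_\lambda$, the dual candidate $\tilde\psi_\lambda := c_\lambda^2 \psi_\lambda$, and the reading of (2.1) as $d_1 |\langle x, \psi_\lambda \rangle| \le |\langle x, \tilde\psi_\lambda \rangle| \le d_2 |\langle x, \psi_\lambda \rangle|$. Under these assumptions $d_1 = \min_\lambda c_\lambda^2$ and $d_2 = \max_\lambda c_\lambda^2$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Unit vectors at 120 degrees; their frame operator is (3/2) * identity.
directions = [[math.cos(math.pi / 2 + 2 * math.pi * k / 3),
               math.sin(math.pi / 2 + 2 * math.pi * k / 3)] for k in range(3)]

# Hypothetical lengths turning the tight frame into a non-tight, scalable one.
lengths = [2.0, 0.5, 1.0]
frame = [[a * u[0], a * u[1]] for a, u in zip(lengths, directions)]

# Scalings c_k > 0 such that {c_k psi_k} is a Parseval frame.
scalings = [math.sqrt(2.0 / 3.0) / a for a in lengths]


def scaled_dual(frame, scalings):
    """Dual candidate psi~_k = c_k^2 psi_k."""
    return [[c * c * v for v in psi] for psi, c in zip(frame, scalings)]


def reconstruct(x, frame, dual):
    """sum_k <x, psi_k> psi~_k, which should return x for a dual frame."""
    coeffs = [dot(x, psi) for psi in frame]
    return [sum(c * d[i] for c, d in zip(coeffs, dual)) for i in range(len(x))]


dual = scaled_dual(frame, scalings)
x = [0.3, -1.2]
x_rec = reconstruct(x, frame, dual)

# Identifiability constants under the stated reading of (2.1).
d1 = min(c * c for c in scalings)
d2 = max(c * c for c in scalings)
```

Since $|\langle x, \tilde\psi_\lambda \rangle| = c_\lambda^2 |\langle x, \psi_\lambda \rangle|$ holds for every $x$, the bounds in (2.1) follow directly whenever the scalings are bounded away from zero.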

The question that naturally arises is whether the two concepts agree. This is indeed not 
the case as the following example shows. 

Example 2.4. It is easy to check that the set of vectors 



is frame for that has an identifiable dual, e.g. 



with $d_1 = d_2 = 1$, but it is neither a tight nor a scalable frame. The fact that it is not scalable
follows from Theorem 3.6 of pi] . 


As we will also see later in this article, the identifiability is actually not needed for the entire
space but only for certain elements, cf. Remark 2.9.

Next, we show how this concept relates to the restricted isometry property and in particular
the number of measurements that are needed in order for it to hold.


2.2 Restricted isometry property 

The restricted isometry property was first introduced by Candès and Tao in [10] based on
the assumption that often the signal that is to be reconstructed is sparse. It was then later
generalized by Candès et al. in [6] to transform sparse signals. This covers a large class of
signals considered in imaging tasks, since the images considered are often sparse in, for instance,
a wavelet domain.

Definition 2.5. Let $\Psi \in \mathbb{C}^{N \times n}$ be a dictionary and $A \in \mathbb{C}^{m \times n}$ the measurement matrix. If
there exists $\delta_s$ such that

$$(1 - \delta_s)\|\Psi^* x\|_2^2 \le \|A \Psi^* x\|_2^2 \le (1 + \delta_s)\|\Psi^* x\|_2^2, \tag{2.2}$$

for all $s$-sparse vectors $x$, then we say $A$ satisfies the $\Psi$-RIP of order $s$ with $\Psi$-RIP constant
$\delta_s$.
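
For very small problems the $\Psi$-RIP constant in (2.2) can be computed by brute force. The Python sketch below does this for order $s = 2$ only and under assumptions made here: real matrices, and every pair of frame vectors linearly independent. For each pair of frame vectors the extreme values of $\|A\Psi^* c\|_2^2 / \|\Psi^* c\|_2^2$ over coefficient vectors $c$ supported on that pair are the two roots $t$ of $\det(G_A - tG) = 0$, where $G$ and $G_A$ are the $2 \times 2$ Gram matrices of the pair and of its image under $A$. The function name and the test matrices are hypothetical.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def psi_rip_constant_order2(A, frame):
    """Smallest delta satisfying (2.2) for all 2-sparse coefficient vectors.

    A is a list of rows, frame a list of real frame vectors. Assumes that
    every pair of frame vectors is linearly independent.
    """
    images = [[dot(row, psi) for row in A] for psi in frame]
    delta = 0.0
    for i, j in combinations(range(len(frame)), 2):
        g11, g12, g22 = (dot(frame[i], frame[i]), dot(frame[i], frame[j]),
                         dot(frame[j], frame[j]))
        a11, a12, a22 = (dot(images[i], images[i]), dot(images[i], images[j]),
                         dot(images[j], images[j]))
        # det(G_A - t G) = qa t^2 - qb t + qc
        qa = g11 * g22 - g12 ** 2
        qb = a11 * g22 + a22 * g11 - 2.0 * a12 * g12
        qc = a11 * a22 - a12 ** 2
        root = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
        t_min = (qb - root) / (2.0 * qa)
        t_max = (qb + root) / (2.0 * qa)
        delta = max(delta, abs(t_min - 1.0), abs(t_max - 1.0))
    return delta


frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
one_row = [[math.sqrt(2.0), 0.0]]  # a single rescaled measurement

delta_identity = psi_rip_constant_order2(identity, frame)
delta_one_row = psi_rip_constant_order2(one_row, frame)
```

The identity measures everything and gives a constant of zero, whereas the single-row matrix annihilates some 2-sparse combinations and therefore cannot have a constant below one. The enumeration over supports grows combinatorially, so this is only meant for toy sizes.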

The existence of matrices satisfying the $\Psi$-RIP is given by the following theorem, which will
also be needed for our main result. We move the proof to the appendix as it is very close to the
proof given in pO] .

Theorem 2.6. Fix a probability measure $\nu$ on $\{1, \dots, N\}$, a sparsity level $s < N$, and a
constant $0 < \delta < 1$. Let $\Psi = (\psi_\lambda)_{\lambda \le N}$ be a frame with frame bounds $c_1$ and $c_2$ and let $A$ be an
$n \times n$ matrix whose rows $r_j$ satisfy

$$\sum_i r_k(i)\, r_j(i)\, \nu_i = \delta_{j,k}.$$

Furthermore, let $K$ be a number such that $\|\psi_\lambda\| \le K$ for all $\lambda \le N$ and

$$L = \sup_{\substack{\|\Psi^* c\| = 1 \\ \|c\|_0 \le s}} \frac{\|\Psi \Psi^* c\|_1}{\sqrt{s}}. \tag{2.3}$$

If $\tilde A$ is an $m \times n$ submatrix whose rows are subsampled from $A$ according to $\nu$, then there exists
a $C > 0$ independent of all relevant parameters such that for

$$m \ge \frac{C K}{c_1}\, \delta^{-2} s L^2 \max\{\log^3(sL^2) \log(N),\ \log(1/\gamma)\} \tag{2.4}$$

the normalized submatrix $\sqrt{1/m}\, \tilde A$ satisfies the $\Psi$-RIP of order $s$ with $\Psi$-RIP constant $\delta$ with
probability $1 - \gamma$. Furthermore, if the frame possesses an identifiable dual with upper constant $d_2$,
then $1/c_1$ can be replaced by $d_2$ in (2.4), i.e. the result holds for

$$m \ge d_2\, C K\, \delta^{-2} s L^2 \max\{\log^3(sL^2) \log(N),\ \log(1/\gamma)\}. \tag{2.5}$$


Theorem 2.6 states that the $\Psi$-RIP can be guaranteed provided the number of measurements
scales properly with the sparsity $s$. In fact, there are some further subtleties that have to be
considered. First, it is not only the sparsity $s$ that is important but also the localization factor
$L$. This factor can be controlled, for instance, if the frame is localized [19]. In particular, if
the Gramian has a strong off-diagonal decay, then $L$ can be further characterized with respect
to this decay rate, as $L$ measures how the Gramian distorts the sparsity structure. Second, the
dependency on the lower frame bound through $1/c_1$ can also be controlled by localization arguments.
Note that by Gershgorin's Circle Theorem there exists a $\lambda \in \{1, \dots, N\}$ such that


1 

Cl 


< kv’a,V’m)i 

/i/A 


and therefore 

Thus frames with strong localization properties such that the dual frame is also strongly localized
are of particular interest. It was also proven in [19] that if the primal frame $(\psi_\lambda)_{\lambda \in \mathbb{N}}$ is
polynomially self-localized, i.e. there exists $C > 0$ such that

$$|\langle \psi_\lambda, \psi_\mu \rangle| \le C (1 + |\lambda - \mu|)^{-r}$$

for some $r \in \mathbb{N}$ and all $\lambda \neq \mu$, then the dual frame $(\tilde\psi_\lambda)_{\lambda \in \mathbb{N}}$ will also be polynomially
self-localized. This result was extended in [24] to more general geometries, i.e. to other index
functions than the Euclidean distance.

Theoretically, $s$ can be considered as the dominating value for the number of measurements.
However, in practice it is a priori not clear that it can always be assumed to be smaller than $n$.
In fact, since $N$ is usually much larger than $n$, the analysis coefficients must be extremely
sparse to make this result useful. In Section 5 we will demonstrate that this is indeed an issue
in many imaging problems, but we also show that it has to be further studied from a practical
point of view and is not really an issue of the theory.

The last concept that we need before we can prove the stability result is the notion of
stable frame bounds. This is merely a concept used in order to make all constants and variables
feasible. It also shows how large $s$ can be in the worst case.

2.3 Stable frame bounds


As we are now not necessarily dealing with tight frames or Parseval frames, the ratio of the
frame bounds $c_1/c_2$ is not constant and in fact plays a vital role in the estimates for the proof of
our main results. In order to have some control over this ratio we make the following definition.

Definition 2.7. We say a frame has $q$-controllable frame bounds if there exists a $q \in \{1, \dots, N\}$
such that


^n2-P) ^ ^ 

^/(2-p) — N' 


( 2 . 6 ) 


Note that (2.6) is, same as the identifiability condition in Section 2.1, always fulfilled for
tight frames; in particular it holds for every orthonormal basis or Parseval frame.

We will need the controllability of the frame bounds in order to make the implicit assumptions
on the range of the sparsity $s$ more precise, cf. Theorem 2.8.

Furthermore, the controllability of the frame bounds is also of numerical importance, since
the number of frame elements $N$ is usually very large in practice and in order to have numerical
stability the frame ratio $c_1/c_2$ should not be too small. In Section 5 we will verify (2.6) for
digital shearlets [31]. Note that the term on the left-hand side of (2.6) decreases as $p$ increases.


2.4 Stability for analysis formulation 

Based on the previously introduced principles of identifiable duals and controllable frame
bounds, we can prove the following theorem that guarantees stability of solutions obtained
by ($\ell^p$-P$_\varepsilon^A$). The proof of the theorem is postponed to Section 3.1.

Theorem 2.8. Let $\Psi$ be a frame that has $q$-controllable frame bounds and has an identifiable
dual. Moreover, let $A$ be a measurement matrix satisfying the $\Psi$-RIP with $\delta_\nu < 0.5$ where


V = s- 


p/(p-2) 


and s < 2 d 2 (i+ 2 -p)i/(p- 2 ) d 2 > 0 is the upper eonstant in 


the identifiability conditions. Then the solution $x^*$ of ($\ell^p$-P$_\varepsilon^A$) satisfies

$$\|x - x^*\|_2^p \le C_1(p)\,\varepsilon^p + C_2(p) \frac{\|\Psi x - (\Psi x)_s\|_p^p}{s^{1-p/2}}$$

for some positive constants $C_1(p)$ and $C_2(p)$ that depend on $p$, the frame bounds $c_1, c_2$, the
$\Psi$-RIP constant $\delta_\nu$, the sparsity $s$, the controllability parameter $q$, and the constants from the
identifiability condition.


Remark 2.9. The proofs of Theorem 2.6 and Theorem 2.8 show that we use the identifiability
only for the rows of the measurement matrix and $x - x^*$. Therefore, it does not need to hold for
every element of $\mathbb{C}^n$. Further, the dual does not have to be known or constructed explicitly. It
suffices to know that it exists.


For $p = 1$ Theorem 2.8 generalizes the findings of [6] to the case of non-tight redundant
systems; thus our result can be seen as a generalization of the stability result for the analysis
formulation of $\ell^1$-minimization ($\ell^1$-P$_\varepsilon^A$). In particular, if $\Psi$ is a Parseval frame the constants
$C_1(p), C_2(p)$ agree for $p = 1$ with those obtained in [6]. Furthermore, the constants $C_1(p), C_2(p)$
appearing in the proof are monotonically decreasing for decreasing $p$. We state this fact as a
corollary.

Corollary 2.10. Let 'k 6e a Parseval frame and let A be a measurement matrix satisfying the 
'^-RIP with Sjs < 0.6 and s < ^ • Then the solution x* of {IP-P^) satisfies 


$$\|x - x^*\|_2^p \le C_1(p)\,\varepsilon^p + C_2(p) \frac{\|\Psi x - (\Psi x)_s\|_p^p}{s^{1-p/2}}$$

for some positive constants $C_1(p)$ and $C_2(p)$ that depend on $p$. Moreover, the constants can be
chosen to be monotonically decreasing for decreasing $p$ with maximal constants agreeing with
those obtained in [6] for $p = 1$.


For the sake of completeness we also discuss the minimization problem ($\ell^p$-P$_\varepsilon^D$) while
assuming the $\Psi$-RIP.


2.5 Stability for dual analysis formulation

As we already mentioned in Section 2.1, our motivation for the concept of an identifiable
dual was to deal with the mismatch between the sparsity system $(\psi_\lambda)_{\lambda \le N}$ and the coefficients
$(\langle x, \psi_\lambda \rangle)_{\lambda \le N}$, which do not give rise to a representation in $(\psi_\lambda)_{\lambda \le N}$. It is therefore very natural
to expect that, when minimizing over the dual frame coefficients, the concept of an identifiable dual
is not needed. This is indeed the case as the following theorem shows.


Theorem 2.11. Let $A$ be a measurement matrix satisfying the $\Psi$-RIP with $\delta_\nu < 0.5$ where

V(2-p) /2p\V(2-p) -^ 


V = s 


and s < N { ^ 


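. Then the solution $x^*$ of ($\ell^p$-P$_\varepsilon^D$) satisfies

$$\|x - x^*\|_2^p \le C_1(p)\,\varepsilon^p + C_2(p) \frac{\|\tilde\Psi x - (\tilde\Psi x)_s\|_p^p}{s^{1-p/2}}$$

for some positive constants $C_1(p)$ and $C_2(p)$ that depend on $p$, the frame bounds $c_1, c_2$, the
$\Psi$-RIP constant $\delta_\nu$, and the sparsity $s$.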


The reader might wonder why the controllability of the frame bounds is not assumed. In fact, 
the role of the controllability parameter q was to determine the range of s. However, loosely 
speaking, the constants are now appearing in the right order which makes the assumptions 
superfluous and the range of s can be determined entirely. 

The proof follows very closely the lines of the proof of Theorem 2.8. We provide a sketch of
it in Subsection 3.3.


3 Proofs 


3.1 Proof of Theorem 2.8


The proof of Theorem 2.8 is mainly inspired by [6] and [45] which have their roots in |^.

Let $x$, $x^*$ be as in the theorem and define $z = x - x^*$. Further, set $T_0$ as the set consisting
of the $s$ largest coefficients of $\Psi x$ in $p$-th modulus. For any set $T$, $\Psi_T$ denotes the matrix $\Psi$
restricted to columns indexed by $T$. Then we divide $T_0^c$ into sets $T_1, T_2, \dots$ of size $M$ in order
of decreasing magnitude of $\Psi_{T_0^c} z$. The explicit value of $M$ will be determined later. We shall
need the following results.

The first one, Lemma 3.1 below, is a trivial modification of the cone constraint Lemma 2.1
in [6]. For completeness we provide a proof.

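Lemma 3.1. The vector $\Psi z$ obeys the following cone constraint:

$$\|\Psi_{T_0^c} z\|_p^p \le 2\|\Psi_{T_0^c} x\|_p^p + \|\Psi_{T_0} z\|_p^p.$$

Proof. Since $x$ and $x^*$ are both feasible and $x^*$ is the minimizer of ($\ell^p$-P$_\varepsilon^A$) we have

$$\begin{aligned}
\|\Psi_{T_0} x\|_p^p + \|\Psi_{T_0^c} x\|_p^p &\ge \|\Psi x^*\|_p^p \\
&= \|\Psi x - \Psi z\|_p^p \\
&\ge \|\Psi_{T_0} x\|_p^p - \|\Psi_{T_0} z\|_p^p - \|\Psi_{T_0^c} x\|_p^p + \|\Psi_{T_0^c} z\|_p^p.
\end{aligned}$$

Rearranging the terms yields the claim.

□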


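The following lemma generalizes Lemma 2.2 in [6] to the case $p \in (0,1)$. For $p = 1$ the
results coincide.

Lemma 3.2. For $\rho = s/M$ and $\eta = 2\|\Psi_{T_0^c} x\|_p^p / s^{1-p/2}$ we have

$$\sum_{j \ge 2} \|\Psi_{T_j} z\|_2^p \le \rho^{1-p/2} \big( \|\Psi_{T_0} z\|_2^p + \eta \big).$$

Proof. By construction we have that every entry of $|\Psi_{T_{j+1}} z|^p$, which will be denoted by
$|\Psi_{T_{j+1}} z|^p_{(k)}$, is bounded as follows

$$|\Psi_{T_{j+1}} z|^p_{(k)} \le \frac{\|\Psi_{T_j} z\|_p^p}{M}.$$

Therefore

$$\|\Psi_{T_{j+1}} z\|_2^2 = \sum_{k=1}^{M} |\Psi_{T_{j+1}} z|^2_{(k)} \le M^{1-2/p}\, \|\Psi_{T_j} z\|_p^2$$

and thus

$$\sum_{j \ge 2} \|\Psi_{T_j} z\|_2^p \le \sum_{j \ge 1} \frac{\|\Psi_{T_j} z\|_p^p}{M^{1-p/2}} = \frac{\|\Psi_{T_0^c} z\|_p^p}{M^{1-p/2}}.$$

Applying Lemma 3.1 and the generalized mean inequality gives

$$\sum_{j \ge 2} \|\Psi_{T_j} z\|_2^p \le \Big( \frac{s}{M} \Big)^{1-p/2} \Big( \|\Psi_{T_0} z\|_2^p + \frac{2\|\Psi_{T_0^c} x\|_p^p}{s^{1-p/2}} \Big).$$

□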
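
The block-wise estimate at the heart of this proof, $\|v_{T_{j+1}}\|_2^p \le \|v_{T_j}\|_p^p / M^{1-p/2}$ for consecutive blocks of a vector sorted by decreasing modulus, can be sanity-checked numerically. The following Python sketch is only an illustration: it sorts a whole random real vector into blocks of size $M$ (the proof applies this to $\Psi_{T_0^c} z$), and the function names, the seed and the tolerance are assumptions made here.

```python
import math
import random


def shelling_blocks(v, block_size):
    """Index blocks T_1, T_2, ... in order of decreasing modulus of v."""
    order = sorted(range(len(v)), key=lambda k: abs(v[k]), reverse=True)
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]


def lp_power(v, idx, p):
    """||v_T||_p^p restricted to the index set idx."""
    return sum(abs(v[k]) ** p for k in idx)


def l2_norm(v, idx):
    """||v_T||_2 restricted to the index set idx."""
    return math.sqrt(sum(v[k] ** 2 for k in idx))


def check_block_bound(v, block_size, p, tol=1e-12):
    """Check ||v_{T_{j+1}}||_2^p <= ||v_{T_j}||_p^p / M^(1 - p/2) for all j."""
    blocks = shelling_blocks(v, block_size)
    for prev, nxt in zip(blocks, blocks[1:]):
        lhs = l2_norm(v, nxt) ** p
        rhs = lp_power(v, prev, p) / block_size ** (1.0 - p / 2.0)
        if lhs > rhs + tol:
            return False
    return True


random.seed(0)
v = [random.gauss(0.0, 1.0) for _ in range(40)]
holds_half = check_block_bound(v, 5, 0.5)
holds_one = check_block_bound(v, 5, 1.0)
```

The bound holds because every entry of the later block is, in $p$-th modulus, at most the average of the $p$-th moduli in the preceding block.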


Lemma 3.3 ([6], Lemma 2.3). The vector $Az$ satisfies

$$\|Az\|_2 \le 2\varepsilon.$$

Lemma 3.4. Let $c_2$ be the upper frame bound for the frame $\Psi$. Then we have

{i-6s+Mr/m^To,r'i>Tojr2- 

(C2(l + (^c^^‘^d^fi\z\\l + 77) < {2s)P 

where $T_{01} := T_0 \cup T_1$.

Proof. W.l.o.g. we assume the identifiable dual is the canonical dual and define T := 
note that is invertible as it is the frame operator. Then by using Lemma [3.3| we have 

{2ey > Pz||^ 

= > \\A{^Toy*'^ToM 

i>2 
(3.1)

(3.2) 

(3.3) 







and by the ^'-RIP we have 


(1 - 5s+MY/‘^\\{^T,,r^T,M < 


and 


i >2 j>2 

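Recall that by the identifiability there exist $d_1, d_2$ such that

$$d_1 |\langle z, \psi_\lambda \rangle| \le |\langle z, \tilde\psi_\lambda \rangle| \le d_2 |\langle z, \psi_\lambda \rangle| \qquad \forall \lambda \le N.$$

Using the frame property, the identifiability and Lemma 3.2 yields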




i>2 


i>2 




Combining (3.3), (3.4), (3.5) and (3.9) yields the claim.
Lemma 3.5. The following inequality is true


(3.4) 

(3.5) 

(3.6) 

(3.7) 

(3.8) 

(3.9) 

□ 


< 


^^ll^ll2ll(^Toi)*^Toi2:|l2+P^ ^{(^2^\\z\\2+vf 


Proof. With $T_{01} := T_0 \cup T_1$ we have by using the frame property and the identifiability of a
dual


<^(ll^To,?|l 2 '’+ll'I^T^,^ll 2 '’) 

[{{T,i^ToY*^To,W+\\^T§,Z§ 


< 


< 


^ (^ll^ll2ll('I^Toi)*'I'Toi?||^ + p^-PmToZf2 + df 


where the last inequality follows from Lemma 3.2 


□ 


The proof of the main result can now be derived from the previous lemmas.

Proof of Theorem 2.8. By Lemma 3.5 we have


\zfY < — ^ \\{^To,r^To,n^p 


1 


271 


1 fdiWzf^ I ||(TToJ*4^ToJfP 


271 


+ p 2 P{(T/‘^d^\\z\\^ + r]f 

p‘^~P + 2 c|'^^d 2 lkll 2 ^ + Y 


+ 


< 




Ydf Vc?/' 


+ 


271 


+ p^ P [^(Ydf\\z\\f + (^4dl^72\\z\\l^ 
1 ( 1 (llWzp , , 2 - 

- 


271 


+ p‘^ ^( 0 ^^ 2 ^( 1 + 72 ) 11 ^ 112 ^+ (;;^ + l)7^ 


Therefore 

(1- 


?+ (2+= ^ ^ j j " - cfd? ( 272+^ 


, <?+ 

and in particular 

71 


+ + 


,2-P 


72 


+1• 


1/2 


4d?-i^-^,+P^-^4dfa+72) 


1^11! < 




1/2 


and hence, by using Lemma 3.4, we obtain

\[ \Gamma_1(p) \|z\|_2^p - \Gamma_2(p) \eta \le (2\varepsilon)^p \]

with


ri(p) = ^27ic5’'^^(l - 6 s+My{ 4 d? - + p"^ + 72))) - ^J{c2dl{l + 5 M)yp^ p 

and 

r2(p) = - ds+M)^p'^~P ~ V (c2di(l + dM)yp^~^- 

It is left to choose $\gamma_1$, $\gamma_2$, and $\rho$ so that $\Gamma_1(p)$ and $\Gamma_2(p)$ are non-negative. W.l.o.g. we can
assume $c_1 > 1$, otherwise a rescaling can be performed. We now choose $\gamma_1$ = and

. Then ri(p) is larger than zero and for 72 sufficiently small 


M = s 


2.(l+2-P)i/(p-2)cP/(P-2)d2r/(P-2) 


p/(p-2) 


$\Gamma_2(p)$ is also larger than zero. Furthermore, one can check that

\[ \Gamma_1'(p) < 0 \quad \text{and} \quad \Gamma_2'(p) > 0 \]

hold for $\gamma_2$ small enough. This completes the proof.


□ 


3.2 Proof of Corollary 2.10 


The proof of Corollary 2.10 follows the same lines as the proof of Theorem 2.8 and we skip the
details. We only have to verify the statement regarding the constants. For $c_1 = c_2 = 1$ the
above computations reduce to

\[ \Gamma_1(p) \|z\|_2^p - \Gamma_2(p) \eta \le (2\varepsilon)^p \]


with the constants

ri(p) = ^271(1 -bs+uY (1 - (y +p2-p(l+ ^2))) - a/((1 
r2(p) = y^ 27 i(i - (“ + ^) " \/((i + <^m))V“^’- 

One possibility to guarantee that $\Gamma_1(p)$ and $\Gamma_2(p)$ are positive is to choose $\gamma_1 = 1$ and $M = 6s$; then
$\Gamma_1(p)$ is positive and, for $\gamma_2$ sufficiently small, $\Gamma_2(p)$ is positive. Moreover, note that

\[ \Gamma_1'(p) < 0 \quad \text{and} \quad \Gamma_2'(p) > 0 \]


for $\gamma_2$ small enough.

3.3 Proof of Theorem 2.11


The proof follows the argumentation shown in Section 3.1. We will summarize the results that
are needed and sketch a proof of the result.


Proposition 3.6. Retaining the notation and definitions of Subsection 3.1, the following estimates
hold:


i) 


ii)

\[ \sum_{j \ge 2} \|\Psi_{T_j} z\|_2^p \le \rho^{1-p/2} \left( \|\Psi_{T_0} z\|_2^p + \eta \right). \]


iii)


(1 - - (C2(l + {c7'"\W2 +p]< (2e)" 


iv) 


\z\\f < (ll^llf||(Troi)*^'roi2||^ + ^iW^TozWl + P)‘‘ 


Proof. All properties are proved in the same manner as in Subsection 3.1; in particular, items
i), ii), and iv) are trivial adaptations and will be skipped. As for item iii), note that


{2er > Pz||^ 


> ||A(TToJ*'kToJll! - 


i>2 


> (1 - 6s+Mr7\i^T7*^n7r2 - (1 + E \\7t7^t7\ 

i>2 


By ii) we have


^ Y, \7pz\\l 

i>2 i>2 


i>2 


<(?7p^ '^^‘^{\7toZ\\2+p)- 


Thus the claim follows. 


□ 


Sketch of Proof of Theorem 2.11. By Proposition 3.6 iv) we have 


IzWf < 4 


<4 


7llklir , ) + p^--(||$T„..-|l? + 2||$ro7||S7 + 


+ ■ 


2 271 
' HiWzW^Y IK^T^oJ^^ToizIlf 


+ 


271 


+ P^ 7 II^Toi^lla^ + 4+ il2\7ToZ\7 + — 


72
<<5


7iikiir + m ^ +fi+^ i .>= 


271 


72 


Therefore 


I'j _ c|. _ ||,||^. < + (1 + i) 


which in turn implies 


l2^ 

L 


C^7l p2 P(l + ^2) 


1 -^-'-—I \\z\\p-p^-p/^ 11 + -) V 


r?P 


1 \ 


1/2 


72 / 


< ||('hTo.r'I^ro,^||^. 


Using Proposition 3.6 iii) we conclude

\[ \Gamma_1(p) \|z\|_2^p - \Gamma_2(p) \eta \le (2\varepsilon)^p \]

where


ri(p) = 




Tt (1 - !|i - d7|±:Mj (1 _ 


^ 1 n 2 -p 

Cl 


and 


r2(p) = y (1 - '5 ^+m)V-p (1 + ^ - y p2-p (1 + 5^)p). 

/ \ l/( 2 -p) 

Thus choosing, for example, 71 < and M = s ( -^ ) yields the result. Note that it is 


possible to choose the constants such that they obey

\[ \Gamma_1'(p) < 0, \qquad \Gamma_2'(p) > 0. \]


□ 


4 Numerics 


The non-convex $\ell^p$-minimization problem is in general NP-hard, although interior point
methods exist to find local minima [23]. There are also other algorithms proposed to solve, or
rather approximate, the non-convex $\ell^p$-minimization problem, and a comparison of some methods can
be found, for instance, in uni. Many of these methods, are based on iterative reweighting which 
originally stems from [12] and has been used to strengthen the effect of sparsity. The ideas
can be transferred to the analysis-based $\ell^p$-minimization problem. In the next section we
present the algorithm that we use to solve this problem. It can also be found in [21] for the synthesis
formulation.


4.1 Approximating the $\ell^p$-problem by reweighting


In order to solve the constrained analysis minimization problem in the presence
of noise we use an algorithm that is based on the following intuitive discussion. Solving this problem,
that is, solving

\[ \min_x \|\Psi x\|_p^p \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon, \]

is equivalent to solving

\[ \min_x \sum_\lambda \frac{|\Psi x|_\lambda}{|\Psi x|_\lambda^{1-p}} \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon, \]

where we neglect for a moment the fact that dividing by $|\Psi x|_\lambda$ is by no means always
mathematically legitimate. Using the idea of reweighted $\ell^1$-minimization [12] we wish to strengthen the
effect of sparsity by using the exact signal $x_0$ to consider the weighted problem

\[ \min_x \sum_\lambda \frac{|\Psi x|_\lambda}{|\Psi x_0|_\lambda^{1-p}} \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon. \]

If $(\Psi x_0)_\lambda = 0$, the weight is set to $\infty$. Clearly, multiplying the denominator by some $\mu > 0$
does not change the minimizer; hence, we consider

\[ \min_x \sum_\lambda \frac{|\Psi x|_\lambda}{(\mu |\Psi x_0|_\lambda)^{1-p}} \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon. \]

The purpose of $\mu$ is to have better numerical control over the magnitude of the analysis
coefficients, and it should be signal dependent. However, since $x_0$ is not available in advance we might
consider using an approximate vector $x^k$ and consider

\[ \min_x \sum_\lambda \frac{|\Psi x|_\lambda}{(\mu |\Psi x^k|_\lambda)^{1-p}} \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon. \tag{4.1} \]

The approximate vector $x^k$ can be obtained by solving (4.1) after an initialization of $x_0$. Finally,
in order to prevent any instabilities one can introduce $\nu > 0$ and solve

\[ \min_x \sum_\lambda \frac{|\Psi x|_\lambda}{(\mu |\Psi x^k|_\lambda + \nu)^{1-p}} \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon. \tag{4.2} \]


The final program is then the following.


    Input: $y, \mu, \varepsilon, M$.
    Output: $x$
    Initialize $k = 0$, $W = (w_\lambda)_\lambda = 1$.
    while $k < M$ do
        Find a solution of (4.2), i.e. find $x^k$ such that
            $x^k = \operatorname{argmin}_x \|W \Psi x\|_1$ s.t. $\|y - Ax\|_2 \le \varepsilon$.        (4.3)
        Update $W$ by setting
            $w_\lambda = 1 / (\mu |\Psi x^k|_\lambda + \nu)^{1-p}$.
        Increase $k \leftarrow k + 1$.
    end

Algorithm 1: Algorithm used to solve the $\ell^p$-minimization problem.


Note that (4.2) and (4.3), respectively, can be solved using many different common $\ell^1$-solvers


such as NESTA ([^) which is available at 

http://statweb.Stanford.edu/~candes/nesta/ 
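The outer loop of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `update_weights` and `reweighted_analysis` are hypothetical names, the weighted $\ell^1$ solver for (4.3) is passed in as a callable (`solve_weighted_l1`, for instance a wrapper around an existing $\ell^1$-solver) and is not implemented here, and the default values of `mu` and `nu` are placeholders rather than the values used in the experiments.

```python
from typing import Callable, List, Sequence, Tuple


def update_weights(coeffs: Sequence[float], p: float,
                   mu: float = 1.0, nu: float = 1e-3) -> List[float]:
    """Return w = 1 / (mu * |c| + nu) ** (1 - p) for every analysis coefficient c."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if mu <= 0.0 or nu <= 0.0:
        raise ValueError("mu and nu must be positive")
    return [1.0 / (mu * abs(c) + nu) ** (1.0 - p) for c in coeffs]


def reweighted_analysis(
    solve_weighted_l1: Callable[[List[float]], List[float]],
    analysis: Callable[[List[float]], List[float]],
    num_coeffs: int,
    p: float,
    max_iter: int = 10,
    mu: float = 1.0,
    nu: float = 1e-3,
) -> Tuple[List[float], List[float]]:
    """Run max_iter reweighting steps and return the last iterate and weights.

    solve_weighted_l1(weights) is assumed to return a minimizer of the
    weighted l1 problem (4.3); analysis(x) is assumed to return Psi x.
    """
    weights = [1.0] * num_coeffs  # W = 1, so the first solve is plain l1
    x: List[float] = []
    for _ in range(max_iter):
        x = solve_weighted_l1(weights)
        weights = update_weights(analysis(x), p, mu, nu)
    return x, weights
```

Note that for p = 1 the exponent 1 - p vanishes, so all weights stay equal to one and every iteration solves the same unweighted problem.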
4.2 Reconstruction from Fourier measurements using shearlets

Shearlets were first introduced in [26, 35] as a frame for the Hilbert space $L^2(\mathbb{R}^2)$ that is
based on anisotropic scaling, shearing, and translations. Their construction is not limited to the
two-dimensional case, and the higher-dimensional case has been considered in [32]. We shall not
go into detail of the theoretical concepts of shearlets but refer the reader to [31, 29, 37, 33] for
the theoretical background and implementations of such systems.

The shearlet implementation used in this paper was downloaded from

http://www.shearlab.org

The object of interest is the GLPU phantom introduced in [25], which can be downloaded
at


http://bigwww.epfl.ch/algorithms/mriphantom/#soft

As already mentioned in the introduction, one of the applications of the analysis-based
$\ell^p$-minimization problem is magnetic resonance imaging, where the sampling process is modeled
by taking samples of the Fourier transform of the signal. The minimization problem for this
application is


\[ \min_x \|\Psi x\|_p^p \quad \text{s.t.} \quad \|y - \mathcal{F}x\|_2 \le \varepsilon \tag{4.4} \]

where $\Psi$ is the shearlet transform and $\mathcal{F}$ is the undersampled Fourier transform restricted to
a certain subset of frequencies. Typical choices for the set of measured frequencies are samples
that lie on continuous trajectories such as horizontal lines, spirals, radial lines, etc. For our
numerical experiments we have chosen a radial sampling mask consisting of 30 radial lines, see
Figure 3 below.
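A radial sampling mask of this kind can be generated, for instance, as in the following minimal sketch. The function name `radial_mask` is hypothetical, and several details are assumptions rather than the setup used for the experiments: the zero frequency is assumed to sit at the center of the grid, each of the lines is a full diameter through the center, the angles are equally spaced, and the lines are rasterized by nearest-pixel rounding.

```python
import math
from typing import List


def radial_mask(n: int, num_lines: int = 30) -> List[List[bool]]:
    """Boolean n x n mask that is True on num_lines lines through the grid center."""
    mask = [[False] * n for _ in range(n)]
    center = (n - 1) / 2.0
    radius = math.hypot(center, center)       # distance from center to a corner
    steps = 4 * int(math.ceil(radius)) + 1    # odd count: half-pixel spacing, t = 0 included
    for k in range(num_lines):
        theta = math.pi * k / num_lines       # equally spaced angles in [0, pi)
        dx, dy = math.cos(theta), math.sin(theta)
        for j in range(steps):
            t = -radius + 2.0 * radius * j / (steps - 1)
            col = int(round(center + t * dx))
            row = int(round(center + t * dy))
            if 0 <= row < n and 0 <= col < n:
                mask[row][col] = True
    return mask
```

The undersampled Fourier transform would then keep only those (centered) Fourier coefficients at which the mask is True; the fraction `sum(map(sum, mask)) / n**2` gives the resulting sampling rate.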



Figure 3: Left: GLPU phantom from [25]. Middle: Radial sampling pattern used in the Fourier
domain. Right: Fourier inversion of the data.


We now let Algorithm 1 run for a maximum of 10 iterations, i.e. M = 10 in Algorithm 1.
The relative error is depicted in Figure 4. The curve for p = 1 is not shown,
as it would be a constant line at the level of the first error (~ 17%). For p < 1, the error
decreases with every iteration k, and the decrease becomes more pronounced as p gets smaller. It is important to
mention that for p = 1 the result corresponds to conventional compressed sensing, i.e. standard
$\ell^1$-minimization.

Figure 4: Relative error.



Figure 5: Left: Reconstruction for p = 0.1. Right: Reconstruction for p = 1. 


Moreover, we would like to mention that the other solutions are not obtained by using additional
information except the sparsity structure that is computed from the previous
$\ell^1$-minimization. More precisely, the $\ell^1$-solution is used to obtain new weights, which are now
a good guess for the sparsity pattern. These weights are then returned to the algorithm so
that it computes a new solution. In that sense the improvement is for free. However, it is
computationally much more demanding, as we have to solve the minimization problem M = 10
times. Further, the algorithm takes longer as p gets smaller due to numerical instabilities, cf. 
Figure]^ 

In Figure 6 we plot the relative error for a different number of iterations. Indeed, the first
3x3 block of plots shows, for p = 0.1, 0.2, ..., 0.9, the nine sequences of relative errors between the
consecutive analysis coefficient vectors $\Psi x_p^{k+1}$ and $\Psi x_p^k$, indexed by k = 1, ..., 10, where

\[ x_p^{k+1} = \operatorname{argmin}_x \|W \Psi x\|_1 \quad \text{subject to} \quad \|y - Ax\|_2 \le \varepsilon \]

with

\[ w_\lambda = \frac{1}{(\mu |\Psi x_p^k|_\lambda + \nu)^{1-p}}. \]

That is, we show the outcome of Algorithm 1 for all p = 0.1, 0.2, ..., 0.9 and increasing k from
1 to 10. The second 3x3 block then shows the same sequences but starting from k = 2, i.e. for
k = 2, ..., 10, for better visibility of the error curves, and the last one shows them for k = 3, ..., 10.

By plotting the same sequence with $k \ge 2$ and $k \ge 3$ we can observe that the relative error
decreases very quickly to zero after very few iterations, which suggests fast numerical convergence
of the algorithm. However, it also shows that such an approach via reweighting leads to less
stable outcomes for very small p, which is strongly visible in the third 3x3 block plot for p = 0.1
and larger k. The oscillations also show that a stopping criterion has to be chosen
very carefully. We have not incorporated any criterion other than the maximum number of iterations,
which terminates the algorithm once this number is reached.
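To illustrate what a more careful stopping rule could look like, the following sketch monitors the relative change between consecutive coefficient vectors and stops only after several consecutive small values, so that a single dip in an oscillating sequence does not trigger termination. This rule is not used in the experiments above; the names `relative_change` and `should_stop`, the Euclidean norm, the normalization by the current iterate, and the values of `tol` and `patience` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def relative_change(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Euclidean norm of curr - prev divided by the Euclidean norm of curr."""
    num = math.sqrt(sum((c - q) ** 2 for c, q in zip(curr, prev)))
    den = math.sqrt(sum(c ** 2 for c in curr))
    return num / den if den > 0.0 else float("inf")


def should_stop(history: List[float], tol: float = 1e-3, patience: int = 2) -> bool:
    """Stop once the last `patience` relative changes are all below tol."""
    recent = history[-patience:]
    return len(recent) == patience and all(r < tol for r in recent)
```

In a reweighting loop one would append `relative_change(coeffs_prev, coeffs_curr)` to `history` after every iteration and leave the loop as soon as `should_stop(history)` returns True or the maximum number of iterations M is reached.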
Figure 6: Relative error of the sequence of iterations. Starting from k = 1, 2, and 3, respectively.

4.3 q-controllability of shearlets


Finally, we give a numerical outcome of the q-controllability of the shearlet frame bounds (2.6)
in Table 1, as this is important in practice. In fact, the numbers show that one can choose
q ~ 20000 in this particular case. It is important to have this number q as large as possible,
since this in turn allows s to be large in Theorem 2.8.


Scale J | $c_1/c_2$ | $N^{-1}$  | n x n
2       | 0.1152    | 7.266e-7  | 256 x 256
3       | 0.0889    | 3.913e-7  | 256 x 256
4       | 0.883     | 6.69e-8   | 512 x 512
5       | 0.0628    | 4.19e-8   | 512 x 512
6       | 0.0527    | 7.6e-9    | 1024 x 1024


Table 1: Verification of (2.6). In these cases q can be chosen to be ~ 1e6.


However, the numbers presented in Table 1 are very pessimistic and have to be interpreted
in the right context, as the discussion of the next section will show.
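As a small worked computation, the entries of Table 1 can be turned into redundancy factors. The sketch below assumes that the third column of Table 1 lists the reciprocal 1/N of the number N of frame elements (an interpretation of the extracted table, not a statement taken from the text) and that the ambient dimension is the number of pixels of the n x n image; the name `redundancy` is hypothetical.

```python
# (number of scales J, printed value of 1/N, image side length)
table = [
    (2, 7.266e-7, 256),
    (3, 3.913e-7, 256),
    (4, 6.69e-8, 512),
    (5, 4.19e-8, 512),
    (6, 7.6e-9, 1024),
]


def redundancy(inv_num_elements: float, side: int) -> float:
    """Number of frame elements divided by the number of pixels."""
    return 1.0 / (inv_num_elements * side * side)


# Rounded redundancy factor for each row of the table.
factors = [round(redundancy(inv_n, side)) for _, inv_n, side in table]
```

Under these assumptions the printed values correspond to redundancy factors of roughly 21, 39, 57, 91 and 125, which gives a feeling for how quickly the number of analysis coefficients grows with the number of scales.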


5 Redundancy versus sparsity in compressed sensing 

In the preceding sections of this article we have only considered the analysis formulation for
redundant transforms. This is theoretically necessary: if the transform corresponds
to a basis, there is no point in distinguishing between the analysis and the synthesis formulation,
as these two problems would be equivalent. It has also been observed in applications that
redundant transforms can yield better results. This is, for example, the case for the redundant
wavelet transform in image restoration [46]. From that point of view redundancy greatly helps,
and one might argue that it is also needed or at least desired in certain applications. The
purpose of this section is to argue that, although redundancy seems to yield a great benefit,
one has to be careful and discuss: How much redundancy is good in practice? Moreover, the
redundancy factor should be a discussion on its own and should not be confused with the results
of this paper.

We now consider two natural images of pixel size 2848 x 2848, shown in Figure 7, and
demonstrate that typical (redundant) sparsifying transforms such as the wavelet and shearlet
transform are, from a compressed sensing point of view, too redundant.



Figure 7: Reference images 


5.1 Redundant wavelet transform 

We first test the redundant wavelet transform that is made available by van den Berg and
Friedlander in the Spot toolbox, which can be downloaded at

http://www.cs.ubc.ca/labs/scl/spot/index.html

We first compute the wavelet decomposition of both reference images shown in Figure 7 for
different total numbers of scales J = 2, 4, 6. From these wavelet coefficients we have computed
the best s-term approximation using s = 90% of the total number of pixels. This should not
be confused with the total number of coefficients, which is much larger. In the notation of our
previous results we let $s = 9/10 \cdot n$, where $n = 2848^2$ is the dimension of the ambient space. The
reconstructions are shown in Figure 8.



Figure 8: 90% of ambient dimension without subsampling (wavelets). Each column represents 
the reconstruction using 2, 4, and 6 scales respectively. 

Obviously, the reconstructions get worse if more scales are available. More interestingly, the
more coefficients are available (obtained by increasing the number of scales), the more
weight the low-frequency part gets in terms of the magnitudes of the coefficients, resulting
in an image that is very blurry. This shows that many more coefficients than the ambient
dimension are needed in this case.

Now we conduct the same experiment again, except that we do some subsampling before
we take the best s-term approximation. More precisely, we use the same coefficients but only
consider every fourth wavelet coefficient and set all other coefficients to zero. Thus we divide the
initial redundancy by a factor of 4. We then take the best s-term approximation using s = 30%
of the total number of pixels. The outcome can be seen in Figure 9. The reconstructed images
in Figure 9 are significantly better than those obtained in Figure 8. From a theoretical point
of view this behaviour is expected, as the redundancy does not improve the approximation
rate. However, it also hints that the intrinsic sparsity of the image in the analysis coefficients
$(\langle x, \psi_\lambda \rangle)_\lambda$ is much smaller than what we can observe in the s-term approximation.
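The two operations used in this experiment, subsampling of the coefficient vector and the best s-term approximation, can be sketched as follows. The function names are hypothetical, and the choice of which coefficients survive the subsampling (here those at positions 0, 4, 8, ... of a flattened coefficient vector) is an assumption, since the precise subsampling pattern is not specified.

```python
from typing import List, Sequence


def subsample(coeffs: Sequence[float], step: int = 4) -> List[float]:
    """Keep every step-th coefficient and set all other coefficients to zero."""
    return [c if i % step == 0 else 0.0 for i, c in enumerate(coeffs)]


def best_s_term(coeffs: Sequence[float], s: int) -> List[float]:
    """Keep the s coefficients of largest magnitude and set the rest to zero."""
    if s <= 0:
        return [0.0] * len(coeffs)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:s])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
```

For the experiment described above one would first apply `subsample` to the redundant transform coefficients, then `best_s_term` with s equal to 30% of the number of pixels, and finally reconstruct the image from the thresholded coefficients with a dual (synthesis) operator, which is not part of this sketch.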

As we have already considered the shearlet transform in Section 4.2 of this paper, we next
want to show that this curse of redundancy can also be observed in that particular case.

Figure 9: 30% of ambient dimension with subsampling (wavelets). Each column represents the
reconstruction using 2, 4, and 6 scales respectively. 

5.2 Redundant shearlet transform 

In this section we show the same experiment but with shearlets instead of wavelets. It is
important to mention that the shearlet transform is truly redundant in the sense that there
exists no non-redundant shearlet transform in the literature so far.



Figure 10: 90% of ambient dimension without subsampling (shearlets) using 2, 4, and 6 scales. 



In Figure 10 we show the best s-term approximation using again 90% of the ambient
dimension for a 2-, 4-, and 6-level shearlet decomposition, respectively. We again observe that
the images get more blurry in Figure 10. Similarly to the wavelet case, we can observe that the
low-frequency part gets more and more important if more scales are activated.

Now we again subsample the already computed shearlet coefficients by only considering
every fourth coefficient and deleting all others by setting them to zero. The reconstructions are
shown in Figure 11. Again, one can observe a significant improvement using only 10% of the
ambient dimension. Note that by increasing the number of scales, and in that way adding more elements
to the dictionary, the system gets more and more redundant and the reconstructions get more
and more accurate. This may sound confusing, as we observed the opposite behaviour in Figure
10. There we saw that the redundancy made the s-term approximation worse.



Figure 11: 10% of ambient dimension with subsampling (shearlets) using 2, 4, and 6 scales. 


It is evident from both numerical experiments that there is an intrinsic sparsity
contained in the analysis coefficients of a redundant transform that is not fully characterized by
the sparsity assumption alone. One possible approach to tackle this problem from a different
perspective could be to rely on the statistical dimension [2]. This is, however, not part of this
work. Further, note that it is not correct to say that the compression rate is not strong enough
for the results of this paper to apply: Figure 11 clearly shows that if one were to use
a subsampled shearlet transform, then the compression rate would be good enough for our results to
apply.


6 Discussion and future work 

As we discussed in Section 2, it is our belief that the minimization problem should be
performed over dual coefficients instead of transform coefficients if the $\Psi$-RIP is to be assumed.
Surely, if the dual system and the primal system give rise to the same sparsity pattern, then this
argument should no longer be valid, which was demonstrated in our analysis by the concept of
identifiable duals. However, the design of sparse representation systems that have good duals
is a challenging task in applied harmonic analysis. Nevertheless, it is necessary if one wants to
combine compressed sensing with redundant dictionaries.

Furthermore, we want to comment on some problems that are left for future work:

- In order to find the optimal p the constants $C_1(p)$, $C_2(p)$ should be optimized. For this
one has to optimize over $p, \gamma_1, \gamma_2, M, s$.

- The class of frames that have an identifiable dual can be seen as a generalization of
scalable frames. Since the latter have nice geometric characterizations [34] via open
quadratic cones, it is interesting to see whether similar characterizations can be computed
for frames that have an identifiable dual.

- Another question left for future work is the replacement of the $\Psi$-RIP by an isometry
condition, as has been done for the synthesis model in [7], in order to have a RIPless
theory for the analysis formulation of the $\ell^p$-minimization problem.

- An infinite-dimensional scenario could also be investigated. In particular,
in this case the q-controllability (2.6) should be replaced by another condition that
may shed more light on the redundancy problem.

- The redundancy is a very subtle issue. In Section 5 we have seen, for wavelets and shearlets
respectively, that there is an intrinsic sparsity structure that has not been captured by the
theory yet. It also shows that sparsity alone, in the classical sense, is not sophisticated
enough to explain why the analysis formulation works. In particular, the subsampling
issue is coming from the implementations of such transforms, in our cases wavelets and
shearlets, not from the actual theory. However, a possible point of future work is to
mathematically quantify this redundancy that arises in the discrete setting as presented
in Section 5. A possible approach is to involve the statistical dimension or other geometric
properties of the problem [2, 48].

A Proof of Theorem 2.6



6s = \\A*A - Idn IIA := sup((A*A - Idn)x, x) 

/6A 


where 


A = {x G ranT* : x = T^c, ||c||o < s, ||x|| < 1} C M"’. 


Since = Idn, we have 



A 



i=l 
For a Rademacher sequence $\epsilon = (\epsilon_i)_i$ independent of $(r_i)_i$ we have by Lemma 6.7 in [43]


hence, 


E(5s = —E 

m 


-Er^r*; 


i=l 


2 

< —E 
m 


E 

i=l 


Sir A Vi 


E5^ < 


— E^Ee sup 
m ,,gA 


m 

C^^ir*rix,x) 

i=l 


— ErEe sup 
m 2,gA 


i=l 


Define the pseudo-metric 


1/2 


2i2 


d{x,y) = (Kr*,x)p - \{ri,y)\^) 


\ 2=1 


Then, as shown in |3n) we have for x,y £ A 


\ l/(2p) 


d{x,y) < 2sup ^ |(ri,x - y)|^P j 


\ l/(2p) 


V 2=1 




^ 2=1 

where $p, q > 1$ are such that $p^{-1} + q^{-1} = 1$.

Now, for any h G A of the form 2; = with ||c||o < s and any realization of (rj)j we have 


|(ri,z)| = |(T(T*T) Vi,TT*c)| 

< E Kd,^a)||('&'I^*c)a 

\<N 

\<N 


where $K > 0$ is such that $\|\psi_\lambda\| \le K$ for all $\lambda \le N$ and $c_1$ denotes the lower frame bound.
Therefore 

\{ri, h)\ < —Ly/s 
Cl 

with 


L = 


sup 

|'I'*c||=l 

||c||o<s 


||(TT*c)a||i 


Therefore we obtain 


/ m \ 


= sup 
^eA 




l/(2p) 


< S 


V2=l 

KL\ 
Cl ) 


i,z)\ \(ri,z, 


2\ (p-i)/(2p) 


|2p-2 


supEKD,2)|^^j 


\ l/(2p) 


, zSA 


The rest of the proof follows the argumentation given in |3n) . 

For a set $S$, a metric $d$ and a given $t > 0$ the covering number $\mathcal{N}(S, d, t)$ is defined as the
smallest number of balls of radius $t$ centered at points of $S$ necessary to cover $S$ with respect
to $d$. By Dudley's inequality we have


Eg sup 
xsA 


m 

2=1 


< 4V2 


■\/log(AA(A, d, t) dt. 


(A.1)

Using the semi-norm

2 q


\ 2=1 


we obtain using covering arguments and (A.1)

m 

{'^Sir*riX,x) 


Eg sup 

a:)GA 


2=1 


<C s(^) 


2\ (P-1)/(2P) / m 


\ 1/(2?) 


\ poo ! - 


\ C\ J j 

Now, following the arguments in |3n) we have 

[ 'J'^og{J\f{A, II • \\x,q,t) dt < C\Jq{sL‘^)m^/i log(n) log^(sL2). 


Thus in (A.1) we obtain

2\ (P-1)/(2P) 


C s 


E(5, < 


KL 

Cl 


qm^/isL"^ log(n) log^(sL2) / ^ v i/( 2 p) 

-Esup ( V \{ri,x)\‘^ ) 

xeA ' ^ ' 


m 


\ i=l 


< 


2\(P-l)/(2p) /- 

C'(s(^') I ■\Jqlog{n)log^{sL'^) 


Cl 


\ l/( 2 p) 


< 


C{s{^ 


^l_l/( 2 g)-l/( 2 p) 
2\ (P-1)/(2P) 


E I — II - Idn ||a + II Wn ||a | 


2 = 1 


^qlog{n)log^{sL^) 


vn}!"^ 

We can assume $K/c_1$ to be greater than one, hence

2\ (P-1)/(2P) 

E6s < 


■\/E(i5 -|- 1. 


yglog(n)log2(s ) 


m 


1/2 


Y^EJs -\- 1. 


Choosing p = 1 -|- (log(s(ArL)^C;^ ^)) ^ and g = 1 -|- log{s{KL)‘^Ci yields 

(s(A:L)2q 2)l/2+(p-l)/(2p) < 

hence, 

E<5s < C-\j2 log(n) log^(s(ArL)2c^^)/m\/E(5s -|- 1. 

Finally, 


s{KLYc-^ ^ log(n) log^(sL2) 


E< 5 , < C\ 


m 


provided — logl’^'Uog ^ ^ Therefore, EJ^ < 6/2 for some 5 G (0,1) if 

m > C6~‘^s{KL)'^c//‘^ log^{s{KL)‘^c/‘^) log W (A-2) 
Let fx,y{r) = Re(((r*rj - ldn)z,w)) so that 


m5. = 


Y1 


i=l 


M 

= sup 

A i=i 


Note that we have 

0 Efx,yiri) = 0 , 

> \fx,y{r)\ < s{KL)^c^‘^ + 1, 

> E\fx,y{r)\^ = E\\{r*ri - U)x\\l < {s{KL)\^^ + if. 


Now, fix some $\delta \in (0,1)$ and choose $m$ in accordance with (A.2). Then by Theorem 6.25 of
we have


F{5s >6)< F{6s > E5, + 5/9) 

m 

Y^irtn-Frln) 


i=l 


> E 


Y,{rln-^r*n) 


Z=1 


+ 5m/9 


/ 


< exp 


5m 


9(s{KL)‘2cf+l) 


V 2"" + s{KL)Lf+i) + i 


< exp — 


52 


m 


Cs{KLfc^^J ’ 

where the constant $C$ might have changed in the last estimate. Further, if

m > C5“2s(i<'L)2c/2 log(l/y). 


then (A.3) is bounded by $\gamma$. Thus, $\delta_s \le \delta$ with probability $1 - \gamma$ if

m > C6~‘^s{KLfc^‘^vii&-x.{\o^{s{KLfc^'^)\og{N)fog{l/'^)}. 


(A.3) 


The proof is complete.